NATIONAL CHARTER SCHOOL RESEARCH PROJECT



center on reinventing public education 



Key Issues in Studying Charter Schools and 
Achievement: A Review and Suggestions for 

National Guidelines 

The Charter School Achievement Consensus Panel
Principal Drafters: Julian Betts and Paul T. Hill 



NCSRP White Paper Series, No. 2 
May 2006 



center on reinventing public education 

Daniel J. Evans School of Public Affairs/University of Washington 
2101 North 34th Street, Suite 195, Seattle, WA 98103 206.685.2214 www.ncsrp.org 





THE CHARTER SCHOOL ACHIEVEMENT CONSENSUS PANEL 



Julian Betts 

Professor, Department of Economics, University of California, San Diego, CA, and Senior Fellow, 
Public Policy Institute of California 



Dominic J. Brewer 

Professor of Education and Policy, Planning and Development, University of Southern California, 
Los Angeles, CA, and Co-Director, Center on Educational Governance 



Anthony Bryk 

Professor, Spencer Chair in Organizational Studies, School of Education and Graduate School of Business, 
Stanford University, Stanford, CA, and Founding Director, Consortium on Chicago School Research 



Dan Goldhaber 

Research Associate Professor, Daniel J. Evans School of Public Affairs and Center on Reinventing 
Public Education, University of Washington, Seattle, WA, and Affiliated Scholar, Urban Institute’s 
Education Policy Center 



Laura Hamilton 

Senior Behavioral Scientist, The RAND Corporation, Pittsburgh, PA 



Jeffrey R. Henig 

Professor, Political Science and Education, Teachers College, Columbia University, New York, NY 



Paul T. Hill

Director, Center on Reinventing Public Education, and Research Professor, Daniel J. Evans School of 
Public Affairs, University of Washington, Seattle, WA 



Susanna Loeb 

Associate Professor of Education, Stanford University, Stanford, CA, and Co-Director, Policy 
Analysis for California Education 



Patrick McEwan 

Assistant Professor, Department of Economics, Wellesley College, Wellesley, MA 








ABOUT NCSRP, THE CHARTER SCHOOL ACHIEVEMENT 
CONSENSUS PANEL, & THIS WHITE PAPER



The National Charter School Research Project (NCSRP) aims to bring rigor, evi- 
dence, and balance to the national charter school debate. Its goals are to 1) facilitate 
the fair assessment of the value-added effects of U.S. charter schools, and 2) provide 
the charter school and broader public education communities with research and 
information for ongoing improvement. 

In early 2005, NCSRP convened a national consensus panel to evaluate current 
research on charter school effectiveness and develop standards for future research. 
The goal of the Charter School Achievement Consensus Panel is to improve the 
quality of future charter school research. Secondary goals include influencing the 
kinds of studies that receive funding and helping the media to both understand the 
complexities of charter school research and properly interpret study results. 

The Consensus Panel includes outstanding researchers from different methodologi- 
cal traditions — sociology, economics, psychometrics, and political science — who 
despite differing views on charter schools all agree on the importance of improv- 
ing research. Members include Julian Betts, University of California, San Diego; 
Dominic Brewer, University of Southern California; Anthony Bryk, Stanford 
University; Dan Goldhaber, University of Washington; Laura Hamilton, RAND; 
Jeffrey Henig, Columbia University; Paul Hill, University of Washington; Susanna 
Loeb, Stanford University; and Patrick McEwan, Wellesley College. 

This White Paper is the first in a series of reports from the consensus panel, all of 
which will be concerned with assessing and strengthening the evidence about char- 
ter school outcomes. This report is based on the Consensus Panel’s deliberations and 
incorporates ideas and phrasing contributed by panel members. The line of argument 
is the Panel’s, but choices about the organization of this White Paper and the illus- 
trations used in it were made by drafters Betts and Hill. 

For more information and research on charter schools, please visit the NCSRP web- 
site at www.ncsrp.org. Original research, state-by-state charter school data, and 
links to charter school research from many sources can be found there. 



CONTENTS 



EXECUTIVE SUMMARY

STUDYING CHARTER SCHOOLS & ACHIEVEMENT

Charter School Research: What Are the Key Policy Questions?

Researchers Will Have to Use Multiple Research Approaches to Learn the Impact of Charter Schools

An Example of the Potential Weaknesses of School-Level Studies Relative to Student Value-Added Models

A Summary of the Research Methods Used by the Research Literature to Date

To Randomize or Not? A Tentative Conclusion

Data Challenges Associated with Different Forms of Analysis

Consequences of Poor Data

Implications of Design & Data Problems for the Relative Merits of National Versus Regional & Local Studies

Making the Most of Imperfect Data

Implications for States, Localities, Research Funders, and Media

APPENDICES

List of Charter School Studies Included in Literature Review

Details on the Literature on Charter Schools

TABLES & FIGURES

TABLE 1: Example of a Value-Added Dataset with Students’ Percentile Rankings by Year

TABLE 2: Achievement Levels of Each Student in Table 1 by Year and Grade

FIGURE 1: Misleading Trends in Average Achievement in Charter and Regular Public Schools Based on Average Scores

FIGURE 2: Number of Charter School Achievement Studies, Total and by Research Method, for the Periods 2001-2003 and 2001-2005

FIGURE 3: The Geographic Scope of Studies and Quality Rankings for the Research Methods Adopted in Research Produced Between 2001 and 2005

TABLE 3: A Hierarchy of Data Needs for Three Levels of Research on Charter Schools and Achievement

APPENDIX TABLE 1: Summary of Research on the Effect of Charter Schools on Attendees’ Achievement, Covering Research Released Between 2001 and 2005

APPENDIX TABLE 2: Sources of Studies of Each Type

APPENDIX TABLE 3: Actual vs. Ideal Data for Charter School Research



“...evaluation of all types of schools, charter and others, could be improved 
both by accounting for the difficulty of educating particular groups of 
students before interpreting test scores and by focusing on student gains 
over time, not their level of achievement in any particular year.”

Economic Policy Institute 



EXECUTIVE SUMMARY 



Everyone wants to know how charter
schools are doing. There is a growing 
body of research about how charter 
school students perform on tests, 
but the results have been mixed and 
some studies have sparked bitter controversy. The 
Charter School Achievement Consensus Panel set 
out to understand the strengths and weaknesses of 
the research done to date and to suggest how future 
research could be more definitive. 

Our first step was to make sure we asked the right 
question. It does not make much sense simply to ask 
whether the average child in a charter school is learn- 
ing more or less than the average child in a district- 
run public school, because there are probably many 
factors other than the quality of school programs that 
could cause differences in results. These factors could 
include parents’ education and income, the kinds of 
neighborhoods in which children live, and children’s 
own native abilities and prior educational experienc- 
es. School-wide averages also reveal nothing about 
whether all students achieve at about the same level 
or whether some students are achieving a great deal 
more than others. 

The right question is whether students in charter 
schools are learning more or less than they would 
have learned in conventional public schools. This is 
a reasonable question, but it is easier to ask than to 
answer for three reasons. 



First, it is impossible to observe the same students si- 
multaneously in both charter schools and the schools 
they would have attended had charter schools not been 
available. Thus, it is necessary to create a “counterfactual,”
an approximation to something that never really
occurred. Researchers have approximated this coun- 
terfactual by comparing students in charter schools 
with other students who are similar in some ways but 
do not attend charter schools. Another method that 
researchers have used is to compare the achievement 
gains of individual students before and after they 
switch between charters and regular public schools. 

Second, there are many kinds of charter schools — 
some serving the poor and disadvantaged and oth- 
ers serving the advantaged; some receiving the same 
amount of money as nearby public schools and others 
much less; and some in supportive local environments 
and others constantly fighting off attacks from their 
local school districts and teachers’ unions. Because 
differences among charter schools might be related to 
differences in results, it is necessary to be very clear 
about exactly what kind of charter school the students 
in a study are attending. The results of studies focus- 
ing on one kind of charter school cannot be general- 
ized to all charter schools. 

Third, student achievement is affected by many non- 
school factors, such as the influence of parents and 
peers. Studies that attempt to isolate the effect of 
a student’s attendance at a charter school must use 
statistical methods that try to eliminate anything as- 
sociated with the other factors. These methods are
more or less effective depending on the quality of 
data available and on the numbers of students tested. 
Even the best methods are predisposed to find “no 
school effects” if sample sizes are small or the results 
are highly variable. 
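
The point about small samples can be illustrated with a short simulation. This is only a sketch under invented assumptions: the true effect (0.2 standard deviations), the sample sizes, the number of simulated studies, and the function name `detection_rate` are all hypothetical and are not drawn from any study the Panel reviewed.

```python
import random
import statistics

def detection_rate(n_per_group, true_effect=0.2, trials=500, seed=0):
    """Share of simulated studies in which the charter-minus-regular gap
    is statistically distinguishable from zero (|z| > 1.96)."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        # Scores are in standard-deviation units; charter students get
        # a real, built-in advantage equal to true_effect.
        charter = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        regular = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(charter) - statistics.fmean(regular)
        se = (statistics.variance(charter) / n_per_group
              + statistics.variance(regular) / n_per_group) ** 0.5
        if abs(diff / se) > 1.96:
            detected += 1
    return detected / trials

small = detection_rate(25)
large = detection_rate(400)
print(f"25 students per group:  effect detected in {small:.0%} of simulated studies")
print(f"400 students per group: effect detected in {large:.0%} of simulated studies")
```

Even though every simulated study contains the same real effect, the small-sample studies usually fail to detect it, which is the sense in which small samples are predisposed toward a finding of “no school effects.”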

Much of the Consensus Panel’s work has been to con- 
sider the strengths and weaknesses of different meth- 
ods of making comparisons that approximate the 
“counterfactual” of students’ simultaneously attending 
charter schools and other schools. We rated alterna- 
tive methods according to two criteria: 

► How well the methods eliminated extrane- 
ous factors (e.g., differences in students’ race 
or income) so that any difference in perfor- 
mance could be clearly attributed to stu- 
dents’ attendance at charter schools. Social 
scientists call this criterion internal validity. 

► Whether the schools studied represent all 
charter schools and charter school students or 
a special isolated subset, either of the schools 
themselves or of the types of students who at- 
tend charter schools. Social scientists call this 
criterion external validity. 

These two criteria are demanding because it can be 
difficult to satisfy them both at once. It is easier to 
achieve internal validity if the researcher has a great 
deal of information about the schools and students 
studied and can be sure there are no hidden factors 
that could amplify or work against the effects of stu- 
dents’ charter school experience. However, situations 
that enhance internal validity are often special and 
unrepresentative, thus reducing external validity. 

POSSIBLE METHODS FOR 
CHARTER SCHOOL RESEARCH 

The Consensus Panel reviewed several different meth- 
ods used to study charter school achievement. 



The experimental method involves comparing the 
scores of students attending charter schools with 
those of students who applied to the same schools 
but did not get in because all the seats were taken. If 
admissions to over-enrolled charter schools were de- 
termined by fair lotteries, the non-admitted students 
could be considered a random sample of the school’s 
applicant pool. Comparing the scores of admitted 
and non-admitted students would approximate the 
results of a controlled experiment, in which research- 
ers randomly selected students to attend or not attend 
a charter school. 
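
The logic of the lottery comparison can be sketched in a few lines. The example below is a simulation with made-up numbers, not an analysis of real data: the applicant count, score scale, built-in effect of 3 points, and the function name `lottery_estimate` are all assumptions chosen for illustration.

```python
import random
import statistics

def lottery_estimate(applicants, seats, true_effect, seed=1):
    """Simulate a fair admissions lottery and compare the later scores of
    admitted and non-admitted applicants. All numbers are invented."""
    rng = random.Random(seed)
    # Each applicant has a baseline that the researcher never observes
    # (family background, prior schooling, motivation, and so on).
    baselines = [rng.gauss(50.0, 10.0) for _ in range(applicants)]
    winners = set(rng.sample(range(applicants), seats))  # the fair lottery
    admitted, not_admitted = [], []
    for i, base in enumerate(baselines):
        noise = rng.gauss(0.0, 5.0)
        if i in winners:
            admitted.append(base + true_effect + noise)
        else:
            not_admitted.append(base + noise)
    # Because chance alone decided admission, the two groups have similar
    # baselines on average, so the gap in means estimates the effect.
    return statistics.fmean(admitted) - statistics.fmean(not_admitted)

estimate = lottery_estimate(applicants=4000, seats=2000, true_effect=3.0)
print(f"Estimated charter effect: {estimate:.2f} (simulated true effect: 3.00)")
```

The estimate is close to the built-in effect only because admission was random. If admitted students differed systematically from non-admitted ones, the same subtraction would mix the school's effect with those differences.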

The experimental method is often not feasible, for 
example when charter schools are not over-enrolled 
or when admissions are not at random. If research- 
ers cannot implement the experimental method, they 
must instead use “observational” methods that at- 
tempt to create a counterfactual indicating how char- 
ter school students would have performed if they had 
not attended charter schools. There are five possible 
non-experimental methods: 

► Comparing average scores in charter versus 
non-charter schools, based on one year’s test 
results. 

► Comparing trends over two or more 
years in school-wide average test scores. 

► Comparing scores for individual students in 
charter versus non-charter schools, based on 
one year’s test results, and taking account of 
a few individual student characteristics (e.g., 
race). 

► Comparing trends in individual students’ test 
scores in charter versus non-charter schools 
over two or more years, and taking account of 
some individual student characteristics (e.g., 
race). 

► Using individual students’ test scores before 
and after entering charter schools, in order to 
judge whether students’ learning rates were
higher or lower in charter than in non-charter
schools. 1

This list simplifies the actual range of research meth- 
ods available. Although these five methods dominate 
the “observational” studies, there are various flavors. 
Some of the student-level studies have implemented 
methods designed to create comparison groups of 
students in regular public schools that resemble char- 
ter school enrollees using all the observable charac- 
teristics of students. The Panel judged such methods, 
which are really a variant of the fourth method listed 
here, to potentially have very good internal validity, in 
line with that of the fifth method. 

WHAT METHODS ARE BEST 

In theory, the experimental method can provide the 
greatest internal validity, because it compares students 
who are identical in all ways except for their enroll- 
ment in a charter school, which was decided by lot- 
tery. However, because many charter schools are not 
over-enrolled and do not fill all their seats via rigorous 
lotteries, experimental studies are limited to a subset 
of all charter schools. Thus, experimental studies are 
often low in external validity, since it is not clear how 
representative of charter schools in general they can 
be. 

The non-experimental methods cannot produce re- 
sults with internal validity as high as the experimental 
method. But the best of them can have good to very 
good internal validity and, because they can often en- 
compass a greater variety of charter schools, they can 
have greater external validity than the experimental 
method. External validity depends on whether the 
sample of schools and students studied closely re- 
sembles the whole population of schools to which the 
results will be attributed. A study that focuses on a 
very small sample of schools, a very particular student 
population (such as would be attracted by a charter
school specializing in, say, the dramatic and perform-
ing arts), or on schools with a unique geographic lo-
cation or other attribute cannot tell us much about
charter schools in general.

1. The full text of the White Paper explains the different ways data
collected for a study using this method can be analyzed.

There are also huge differences in internal valid- 
ity among the non-experimental methods. Methods 
that compare only one year’s test results cannot reveal 
whether the students in charter schools have different 
educational histories — higher or lower achievement 
in earlier grades, or greater or lesser trouble adapting 
to school — than children in the regular public schools 
to which they are being compared. These factors can- 
not be controlled for by proxy variables like race or 
income, since students’ educational histories are per- 
sonal, not group characteristics. Thus, studies using 
one-year snapshots of achievement cannot have high 
internal validity, no matter how large a database they 
draw from or how carefully the analysis is done. 
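
A small invented example shows how a one-year snapshot and a comparison of gains can point in opposite directions. The six student records below are hypothetical and were chosen only to make the contrast visible; the helper name `mean_by_sector` is likewise an assumption.

```python
from statistics import fmean

# Hypothetical records: (sector, score last year, score this year).
students = [
    ("charter", 30, 40),
    ("charter", 35, 44),
    ("charter", 40, 51),
    ("regular", 55, 60),
    ("regular", 60, 64),
    ("regular", 65, 71),
]

def mean_by_sector(values):
    """Average a list of (sector, value) pairs separately for each sector."""
    out = {}
    for sector in ("charter", "regular"):
        out[sector] = fmean(v for s, v in values if s == sector)
    return out

# Method based on one year's results: compare this year's scores only.
snapshot = mean_by_sector([(s, now) for s, _, now in students])
# Method based on growth: compare each student's change since last year.
gains = mean_by_sector([(s, now - before) for s, before, now in students])

print("One-year snapshot:", snapshot)
print("Average gain:     ", gains)
```

In this made-up case the charter students score lower in the snapshot but gain more over the year, because they started from a lower level. A snapshot alone cannot distinguish between a weaker school and a lower starting point.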

Further, methods that control for few student char- 
acteristics cannot provide any assurance that the 
students in charter schools are truly similar to the 
students in regular public schools to whom they are 
being compared. No one can know for sure wheth- 
er these comparisons bias the analysis in favor of or 
against charter schools, so these studies cannot have 
high internal validity. The fourth method outlined 
here, which uses trends in student test scores, can 
have moderately high internal validity, depending on 
the amount of evidence that the students in charter 
and non-charter schools are truly similar. The best 
research designs of this type use one of various meth- 
ods to attempt to match students who attend charter 
schools with students in conventional public schools 
who resemble them along multiple dimensions. 
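
One simple form of matching can be sketched as follows. This is an illustration only, not the procedure used in any particular study: the records, the two matching dimensions (prior score and low-income status), the 5-point tolerance, and the function name `match_and_compare` are all assumptions.

```python
from statistics import fmean

# Hypothetical records: (prior_score, low_income, gain over the year).
charter = [(42, True, 8), (55, False, 6), (61, True, 7)]
regular = [(40, True, 5), (44, False, 6), (56, False, 5),
           (60, True, 4), (75, False, 6)]

def match_and_compare(treated, pool, max_gap=5):
    """For each charter student, find the regular-school student with the
    same low-income status and the closest prior score, then average the
    differences in gains. Matching is done with replacement, and charter
    students with no sufficiently close match are dropped."""
    gaps = []
    for prior, low_income, gain in treated:
        candidates = [r for r in pool
                      if r[1] == low_income and abs(r[0] - prior) <= max_gap]
        if not candidates:
            continue
        best = min(candidates, key=lambda r: abs(r[0] - prior))
        gaps.append(gain - best[2])
    return fmean(gaps) if gaps else None

matched_gap = match_and_compare(charter, regular)
print(f"Average gain gap among matched pairs: {matched_gap:.2f}")
```

Matching can only balance the characteristics that are actually recorded. Students who look alike on these two dimensions may still differ in ways the data do not capture, which is why the amount of supporting evidence matters.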

Only the fifth method listed above, which compares 
individual students’ scores before and after enrolling 
in charter schools, can be considered high in inter- 
nal validity. However, methods that track changes 
in individual students’ scores over time are possible 
only in those states and localities that use the same 
kind of test for a long time and keep individual test 
score data. The latter methods also have problems of 
external validity because they cannot be used to assess
charter school effects for students in early grades with 
no test score history, or for students who never switch 
between charter and non-charter schools. 
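
The before-and-after comparison for students who switch sectors can be sketched as below. The student histories are invented, and the function name `within_student_gaps` is hypothetical; studies of this kind typically use regression models with student-level controls rather than this bare subtraction.

```python
from statistics import fmean

# Hypothetical annual test-score gains. Each entry lists
# (sector attended that year, gain during that year).
histories = {
    "student_a": [("regular", 4), ("regular", 5), ("charter", 7), ("charter", 6)],
    "student_b": [("charter", 3), ("regular", 5), ("regular", 4)],
    "student_c": [("regular", 6), ("charter", 8)],
    "student_d": [("regular", 5), ("regular", 5)],  # never switched
}

def within_student_gaps(histories):
    """Per student: mean gain in charter years minus mean gain in regular
    years. Students observed in only one sector contribute nothing."""
    gaps = {}
    for student, years in histories.items():
        charter = [g for sector, g in years if sector == "charter"]
        regular = [g for sector, g in years if sector == "regular"]
        if charter and regular:
            gaps[student] = fmean(charter) - fmean(regular)
    return gaps

gaps = within_student_gaps(histories)
average_gap = fmean(gaps.values())
print(gaps)
print(f"Average within-student gap: {average_gap:.2f}")
```

Each student serves as his or her own comparison, which is the source of the method's internal validity. The student who never switched drops out of the calculation entirely, which illustrates the external validity limitation described above.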

Since no one method is problem free, the only option 
is for researchers to use the best methods available to 
them and make sure the limitations of their results 
are evident. Moreover, since no one study can answer 
all questions, the research community and other audi- 
ences will need to consider the pattern of results from 
multiple studies, rather than relying on one definitive 
result. 

VALIDITY OF METHODS 
USED IN CHARTER SCHOOL 
RESEARCH TO DATE 

What does the existing work on charter schools and 
student achievement look like in terms of research 
methods? The figure below summarizes research on 
charter schools and student achievement completed 
during the periods 2001-2003 and 2001-2005, cate- 
gorized according to the research methods used, with 
the higher-quality methods at the bottom.



As the figure shows, the total number of studies has 
roughly tripled between the periods 2001-2003 and 
2001-2005. 

The next figure shows the geographic scope of the 
studies to date, and, within each geographic level, 
the Consensus Panel’s general quality rankings of 
the research designs used. Studies of multiple states 
or nationwide studies, at least to date, have not used 
methods rated good or very good. This highlights 
the difficulty of doing research that pools schools or 
students across multiple states. To date, district-wide 
studies of charter schools have not lived up to their 
potential, mainly because the analysis has proceeded 
at the school level. In contrast, a surprisingly high 
proportion of state-level studies have used good or 
very good designs that focus on individual students, 
thus avoiding the compositional problems discussed 
earlier. 

Is the quality of studies improving? Slightly, since we 
now have two studies that use experimental methods. 
But the share of studies using weaker research designs 
declined only marginally between the 2001-2003 and 
2001-2005 periods, falling from 64 percent to 61 
percent. 



[Figure: Number of Charter School Achievement Studies, Total and by Research Method, for the
Periods 2001-2003 and 2001-2005]



QUALITY OF STUDIES IS 
LIMITED BY DATA AVAILABILITY 

Researchers in the real world typically must either 
make do without some vital information, or cobble 
together imperfect substitutes for it. Neither alterna- 
tive is very good. 

Studies are handicapped by an absence of standard 
outcome data. Studies that include two or more states 
that use different tests must calculate the equivalency 
between a score on one test versus another — an error- 
filled process that can create false comparisons. Com- 
bined with weak data on student attributes — which 
can make dissimilar students look alike and similar 
students look different — non-comparable test data 
can wreck efforts to compare performance of students 
from different schools. 
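
To see why cross-state comparisons are fragile, consider one common workaround: expressing each score relative to the average and spread of its own state's test. The sketch below uses invented scores and a hypothetical function name, `standardize_within_state`; it is shown as an example of an imperfect substitute, not as a recommended equating procedure.

```python
from statistics import fmean, pstdev

# Hypothetical scale scores from two states that use different tests.
scores = {
    "state_x": [310, 325, 340, 355, 370],
    "state_y": [48, 55, 62, 69, 76],
}

def standardize_within_state(scores):
    """Convert each score to standard-deviation units relative to its own
    state's mean. This puts the numbers on a common scale but does not
    make the two tests measure the same knowledge or skills."""
    z = {}
    for state, values in scores.items():
        mean, sd = fmean(values), pstdev(values)
        z[state] = [(v - mean) / sd for v in values]
    return z

z_scores = standardize_within_state(scores)
for state, values in z_scores.items():
    print(state, [round(v, 2) for v in values])
```

After standardizing, the lowest scorer in each state receives the same value, even though nothing establishes that the two students know the same amount. The common scale rests on the untested assumption that the two states' student populations and tests are comparable.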



Efforts to learn what distinguishes effective from less 
effective charter schools are doubly burdened, by poor 
data on student attributes and outcomes and by weak 
information about the schools themselves. When 
information is scarce about factors that often distin- 
guish strong from ineffective schools — for example, 
financial stability, leadership turnover, teacher attri- 
tion, existence of a reliable parent clientele — it is pos- 
sible to observe but not explain variations in school 
performance. 

[Figure: The Geographic Scope of Studies and Quality Rankings for the Research Methods
Adopted in Research Produced Between 2001 and 2005. Geographic levels: National,
Multiple States, Single State, District, School or Set of Schools, Total.
Quality rankings: Very Good, Good, Fair, Poor.]

RELATIVE MERITS OF
NATIONAL VERSUS REGIONAL
AND LOCAL STUDIES

In theory, national studies should provide the best answers
to general questions about charter school performance.
However, the lack of consistent test data
and detailed information about students and schools 
limits the value of such studies. Existing databases 
like the National Assessment of Educational Progress 
(NAEP) provide only one-year snapshots of student 
achievement and provide too little information about 
students and schools to permit sharp comparisons of 
like with like. 

Local or regional studies are much better positioned 
to incorporate institutional details, and to use a com- 
mon test instrument across schools that also happens 
to be aligned with a particular state’s content stan- 
dards. However, given the great differences among 
charter schools in different states, it is important that 
researchers not extrapolate the results of their local 
studies to charter schools in other states. In fact, there 
is some evidence from our review that charter school 
effectiveness — and the effectiveness of regular public 
schools — varies from one state to another. 

It is not realistic to hold back research until every 
potentially relevant comparison can be made. Given 
that both national and local studies have different 
strengths and weaknesses, it seems clear that these 
two broadly defined research methods complement 
each other. In short, we need both types of studies. 
We also need authors throughout this literature to 
write forthrightly about the strengths and weaknesses 
of their particular research design. 

This White Paper provides guidelines for the 
improvement of studies done at different levels — 
local, within state, and national. 

IMPLICATIONS FOR STATES, 
LOCALITIES, RESEARCH 
FUNDERS, AND MEDIA 

States. Many states have sought to assess charter
schools and other educational innovations in the ab- 
sence of the data required for sound analysis. Only a 
few states keep student records that allow researchers
to follow students as they move between charter
and non-charter schools. Some states are now trying 
to create appropriate databases. In the meantime, it 
might be possible to draw sound judgments about 
charter schools based on records kept by the big ur- 
ban school districts, which are home to the majority 
of charter schools. 

One important change to states’ charter school laws 
would be to require each charter school to provide to 
the district or other chartering authority an annual 
list of lottery participants for each grade, along with 
information on which students won and lost the lot- 
teries, and which actually enrolled. Such a reform 
would reduce temptations for schools to manipulate 
the lottery. It would also make it far simpler for re- 
searchers to conduct experimental studies of the im- 
pact of charter schools on student achievement. 
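
As an illustration of what such an annual list might contain, the sketch below defines one possible record layout. The field names, the class name `LotteryRecord`, the sample entries, and the `summarize` helper are all hypothetical; no state's actual reporting format is implied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LotteryRecord:
    """One applicant in one school's admissions lottery (hypothetical schema)."""
    school: str
    grade: int
    student_id: str
    won_lottery: bool
    enrolled: bool

records = [
    LotteryRecord("School A", 6, "s001", True, True),
    LotteryRecord("School A", 6, "s002", True, False),   # won but went elsewhere
    LotteryRecord("School A", 6, "s003", False, False),
    LotteryRecord("School A", 6, "s004", False, False),
    LotteryRecord("School A", 6, "s005", False, True),   # enrolled despite losing
]

def summarize(records):
    """Count applicants, winners, and enrollment among winners and losers."""
    winners = [r for r in records if r.won_lottery]
    losers = [r for r in records if not r.won_lottery]
    return {
        "applicants": len(records),
        "winners": len(winners),
        "winners_enrolled": sum(r.enrolled for r in winners),
        "losers_enrolled": sum(r.enrolled for r in losers),
    }

summary = summarize(records)
print(summary)
```

Recording both the lottery outcome and actual enrollment lets a reviewer see cases such as winners who did not enroll or non-winners who did, which bear both on the integrity of the lottery and on how an experimental comparison should be interpreted.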

Funders. Serious research on charter school outcomes
will not happen unless foundations and state
and federal governments fund it. We urge funders to 
support charter school outcome studies that: 

► Include multiple years’ test results on all stu- 
dents; 

► Have good demographic data on students, 
which allows simultaneous controls on fac- 
tors known to affect student achievement, 
like native language, race, special education 
needs, family income, and parents’ education; 

► Include information about the schools in 
their sample. For charter schools, include 
how long the schools have been open and 
how long they have provided the grade level 
being tested. 

Media. In any scientific field, media coverage often 
oversimplifies the results of research. Qualifications 
and conditional statements, which researchers must 
make in order to represent their findings accurately, 
fall by the wayside when newspapers and electronic 
media report them. 






Editorial and headline writers need to ask whether 
particular studies warrant the strong policy conclu- 
sions they — and sometimes the authors — would like 
to suggest. We hope this White Paper can serve as 
a guide for future reporting and policy discussions 
about charter school effectiveness. 






STUDYING CHARTER SCHOOLS & ACHIEVEMENT 



CHARTER SCHOOL 
RESEARCH: WHAT ARE THE 
KEY POLICY QUESTIONS? 

Charter schools are a key new addition
to the public school system in
the United States. According to a 
National Charter School Research 
Project (NCSRP) survey, 3,403 
charter schools served over 900,000 students dur- 
ing the 2004-2005 school year. 2 As of January 2006, 
40 states and the District of Columbia have passed 
charter school legislation. Charter schools receive 
exemptions from much of their states’ education 
codes, in the hope that they will provide a wider 
range of high-quality educational experiences for 
public school students. The hope for charter schools 
is threefold. First, proponents argue that charter 
schools will directly boost the academic achieve- 
ment of attendees. 3 Second, many argue that charter
schools will be more cost effective, for instance by
boosting academic achievement more than regular
public schools but at the same cost per student, or by
equaling the achievement gains produced by regular
schools but at a lower cost per student. Third, pro-
ponents argue that competition among schools for
students will force all schools, charters and regular
public schools alike, to improve the quality of educa-
tion they provide, for fear of losing students. Oppo-
nents express a number of fears. Perhaps foremost
among these is that charter schools will undermine
the idea of the common school — the melting pot
of common educational experiences that underlies
the public school system in the United States. A
closely related concern is that the spread of charter
schools could lead to decreased integration along
racial, ethnic, or socioeconomic lines.

2. Todd Ziebarth, Mary Beth Celio, Robin J. Lake, and Lydia
Rainey, “The Charter Schools Landscape in 2005,” in Hopes, Fears, and
Reality: A Balanced Look at American Charter Schools in 2005, ed. Robin J.
Lake and Paul T. Hill (Seattle: Center on Reinventing Public Education,
2005), p. 3.

3. One theory for why charters may perform better is that they
can function more efficiently when freed from state and district regula-
tions and bureaucracy. Some charter school advocates make a different
claim — that charter schools can create a haven for disadvantaged students
in which they can more confidently build their skills and social capital.

These hopes and concerns suggest a research agenda 
along five lines: the direct effect of charter schools on 
achievement, the relative cost effectiveness of char- 
ter schools, the competitive effect on other schools, 
the possible divergence from a common education- 
al experience, and effects on integration. Although 
all five of these issues are of first-order importance, 
progress to date has been the greatest on the first is- 
sue, whether and how charter schools boost academic 
achievement of their enrollees. The second question, 
cost effectiveness, requires evidence on both academic 
outcomes and cost, and cannot move forward in a 
convincing fashion until we can validly assess the ef- 
fect of charters on their students’ achievement. Some 
work has been done on the competition question, and
the Consensus Panel plans to address it in the future. 
At present, the work in this field is limited and faces 
some major difficulties in identifying the geographic 
scope of the competitive effects. Work on the effect 
of charters on integration and on the differences in 
educational experience across schools has just begun. 

Given the primary importance of the direct effect of 
charter schools on achievement, both in terms of work 
completed to date and policy urgency, this White Pa- 
per seeks to outline the key methodological issues in 
this work. There are several reasons why such an over- 
view is timely. From a research perspective, the pace 
of new publications on charters and achievement has 
quickened dramatically in the last few years, but the 
quality of research designs used has remained decid- 
edly mixed. Therefore, we urgently need to develop 
a national consensus on better research designs and 
encourage all researchers to write openly about the 
strengths and weaknesses of their particular studies. 
For state and local policymakers, who both consume 
education research and write requests for proposals 
for new research, now is a good time for a guide to 
help them sift through existing research and develop 
requests for proposals that ask answerable questions 
and request appropriate research designs. Similarly, 
the public is currently reading more and more stories 
in the media about charter schools and achievement. 
Because the quality of media coverage varies dramati- 
cally, reporters, editors, and the public need a much 
better understanding of what various research designs 
can and cannot do. 

To date most research on charter schools and student 
outcomes has asked whether attending a charter school 
affects a student’s test scores. There are related ques- 
tions that have not been studied extensively to date, 
such as whether attending a charter school affects a 
student’s chances of graduating from high school and 
enrolling in and graduating from college, or whether 
there is any link between charter attendance and the 
wages of graduates years after they leave school. So- 
ciety probably cares more about these outcomes than 
test scores, but relevant data are scant. 



An even more embryonic line of research asks what 
distinguishes effective from less effective charter 
schools. For this analysis, researchers need to have a 
convincing way of identifying the impact of a given 
charter school on student performance, as well as in- 
formation on key factors that distinguish one charter 
school from another. 

Answering such questions requires rich and accurate 
data accompanied by convincing analytical methods. 
There has been a great deal of public controversy sur- 
rounding the research completed to date; much of the 
disagreement stems from issues related to data qual- 
ity and the quality of designs and statistical inference 
methods used. 

RESEARCHERS WILL HAVE 
TO USE MULTIPLE RESEARCH 
APPROACHES TO LEARN 
THE IMPACT OF CHARTER 
SCHOOLS 

There is no single method, and no single study, that 
can convincingly tell policymakers all that they need 
to know about the impact of charter schools on 
student learning. Some have argued for the use of 
experimental evaluations, while others note that ex- 
periments solve some problems while potentially cre- 
ating others. Instead, this argument goes, we should 
use non-experimental, observational studies that track 
students’ progress over time as they transfer between 
public and charter schools. 

Some have argued for large-scale national studies, 
while others have argued for a multiplicity of well- 
formulated local or regional studies. We will argue 
that each kind of study has strengths and weaknesses 
that need to be carefully weighed in light of the ques- 
tions policymakers need answered. 

Regardless of the study, it can be judged by two general
criteria. The first is whether the study credibly
establishes that charter schools caused a difference in
students’ outcomes. To make such a determination, 
we need two pieces of information: (1) how students 
fared in charter schools, and (2) how the same stu- 
dents would have fared, had they instead attended 
regular public schools. The difference between the 
two provides a good estimate of charter school effects 
on student outcomes. The first is easy enough to ob- 
tain: we simply measure the outcomes of students at- 
tending charter schools. The second, referred to as the 
counterfactual, is much harder to obtain. Of course, 
we cannot observe the same student in a given school year
and grade simultaneously attending a regular public 
school and a charter school. Instead, we must ap- 
proximate the counterfactual. One method is to use 
a “comparison group” of different students attending 
public schools. (This comparison group of students 
not attending charters is sometimes also referred to 
as a “control group.”) Alternatively, we can compare 
individual students’ performance in the years before 
and after entering a charter school, so that each stu- 
dent becomes her own control. Much of the charter 
school debate has raged over such control groups, and 
whether they really allow researchers to make strong 
statements about the causal effect of attending a char- 
ter school. When they do, the study is said to possess 
internal validity. 
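
The two pieces of information described above can be written compactly. The notation below is our own shorthand rather than anything used in the studies under review: $Y_i^{C}$ stands for student $i$'s outcome after attending a charter school, and $Y_i^{R}$ for the same student's outcome had he or she attended a regular public school in the same year and grade.

```latex
% Effect of charter attendance on student i (illustrative notation, not from the source)
\tau_i = Y_i^{C} - Y_i^{R}

% For a charter student only Y_i^{C} is observed; Y_i^{R} is the counterfactual.
% A comparison-group study replaces the unobserved term with the average outcome
% of other students who attended regular public schools:
\hat{\tau} = \bar{Y}^{C}_{\text{charter students}} - \bar{Y}^{R}_{\text{comparison group}}
```

The estimate $\hat{\tau}$ is internally valid only to the extent that the comparison group's average is a good stand-in for what charter students would have achieved in regular public schools.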

But this is only part of the story. Even if a study has 
high internal validity, we must judge it by a second 
criterion: whether its results can be usefully generalized
to charter schools more broadly. The term “charter
school” connotes a privately-managed and publicly- 
funded school operated through an agreement with a 
state, district, or other chartering authority. Beyond 
that simple definition, there is a great deal of varia- 
tion across states, communities, and school districts 
in charter schools and the students that they enroll 
(as, indeed, there is variance in public schools and 
their students!). To judge whether a charter school 
study — even an internally valid one — is generalizable 
to contexts other than the one in which it was con- 
ducted, one must ask pointed questions: Are the laws 
governing the management and funding of charter 
schools similar? Are the schools and their communities
similar? Are the students similar? If these and
other questions can be answered in the affirmative,
then the study is said to possess external validity. 

Non-experimental studies 

Most charter school studies compare the outcomes of 
charter school students to students currently attend- 
ing regular public schools. In these studies, the essen- 
tial question is whether regular public school students 
provide a close approximation to the counterfactual. 
In other words, do their outcomes indicate how char- 
ter school students would have fared, had they instead 
chosen to attend regular public schools? 

The immediate risk in this non-experimental, or “ob- 
servational,” approach is obvious: any comparison of 
students who attend charters with those who do not 
risks comparing apples and oranges, because of un- 
observed differences between students in these two 
groups. For example, the students may differ in their 
home educational environments, parental motivation, 
or specific educational histories in ways that are dif- 
ficult to measure. If students self-select into charter 
schools based on such personal characteristics, then 
we are unlikely to obtain accurate, or “unbiased,” esti- 
mates of the causal effect of attending a charter school. 
Results will be distorted by what social scientists refer 
to as “selectivity bias.” This form of bias is potentially 
severe because it risks misconstruing differences in 
students’ outcomes — really caused by unobserved dif- 
ferences in family background, environment, or per- 
sonal traits — as being caused by charter schools. As 
a result, most researchers use statistical controls for 
student characteristics, in order to better approximate 
an “apples to apples” comparison. However, there 
remains the distinct possibility that students differ 
in ways not recorded in the data available to the re- 
searcher, and that therefore cannot be controlled even 
with the best methods. 
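
A small simulation can make the mechanism concrete. The sketch below is not drawn from any study. It assumes a single unobserved trait (labeled `motivation`), assumes that this trait raises both test scores and the chance of choosing a charter school, and sets a hypothetical true charter effect of 2 points. Every name and magnitude is an illustrative assumption, and selection could just as easily run in the opposite direction.

```python
import random
from statistics import mean


def simulate_naive_comparison(n_students=20000, true_effect=2.0, seed=0):
    """Return the naive charter-minus-regular gap in mean test scores."""
    rng = random.Random(seed)
    charter_scores, regular_scores = [], []
    for _ in range(n_students):
        # Hypothetical trait that the researcher cannot observe.
        motivation = rng.gauss(0, 1)
        # Assumption: more motivated families are more likely to pick a charter.
        attends_charter = motivation + rng.gauss(0, 1) > 0
        # Assumption: motivation also raises scores, whatever the school type.
        score = 50 + 10 * motivation + rng.gauss(0, 5)
        if attends_charter:
            charter_scores.append(score + true_effect)
        else:
            regular_scores.append(score)
    return mean(charter_scores) - mean(regular_scores)


naive_gap = simulate_naive_comparison()
print(f"assumed true effect: 2.0, naive gap: {naive_gap:.1f}")
```

Under these assumptions the naive gap comes out several times larger than the assumed effect, because it mixes the charter effect with the score advantage of more motivated students. Controlling for observed characteristics would shrink the gap only insofar as those characteristics capture `motivation`.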

Given this likelihood, much of the non-experimen- 
tal research on charter schools should be interpreted 
with a healthy skepticism. In fact, it suggests that sev- 
eral non-experimental research approaches should be 
avoided altogether. 






1) Many school-level analyses compare average 
student test scores across schools at a single 
point in time, with either no or weak controls
for student characteristics. This is perhaps 
the worst research design available, because 
this approach ignores the possibility that 
scores measured in one year were caused by 
students’ schooling experiences in previous 
years, and the possible effects, both positive 
and negative, of student self-selection into 
charter schools. 

2) Some school-level analyses compare trends
over time in average student test scores across 
charter and regular public schools, with few 
controls for student characteristics. This is 
also a weak research design. It improves on 
the first approach, since it measures learning 
that occurred in the year of interest. How- 
ever, this approach will usually fail to control 
adequately for changes in school-level test 
scores that merely reflect changes over time 
in the composition of the student body. For 
example, suppose that for some random rea- 
son the students who leave charters in year 
two of a two-year study are those with the 
lowest test scores. When researchers calcu- 
late average test scores for each school, they 
may incorrectly interpret the rising average 
scores at the charter schools in year two as 
evidence that the charters are boosting indi- 
vidual students’ achievement. Changes in the 
demographic composition of charter schools 
are quite likely, although it is hard to predict 
in which direction these changes might go. 
After all, charter schools typically enroll stu- 
dents from across a school district or com- 
munity, and so enrollment can change more 
quickly than the local neighborhood. We 
also note that the school choice provisions of 
No Child Left Behind (NCLB) are likely to increase
movements of students among schools
in future years. In addition, NCLB calls for 
conversion of low-performing schools into 
charter schools. Presumably many students 
with low test scores will be moving into charter
schools, a phenomenon that could negatively
bias the results of studies using this
method. 

3) A minor improvement is a student-level com- 
parison of test scores at a single moment in 
time between students in charters and regu- 
lar public schools that controls for observ- 
able characteristics. This is a better but still 
weak research design. It is better because 
it attempts to link an individual student’s 
achievement to his or her own observed 
background characteristics. However, this 
research design ignores the history of each 
student, and ignores differences in rates of 
learning between one student and another. 

We would not put much weight on studies 
with the above designs. Method 4, described 
below, represents a definite improvement. 

4) There are a growing number of student-level
analyses of trends over time in student test 
scores that control for individual student 
characteristics. This represents a far better 
research design, because it takes into account 
where a student began on the achievement 
spectrum and controls for observable student 
characteristics. However, there remains a 
risk that a lack of proper controls for unobserved
characteristics of each student makes
comparisons between students at charters and 
regular public schools potentially misleading. 

Student-level approaches, such as the 
fourth method described here, attempt 
to explain growth in students’ achieve- 
ment over time. Such models are gener- 
ally referred to as “value-added” models. 
There are in fact several variants of these. 4 



4. In one approach, researchers attempt to explain gains in test 
scores as a function of characteristics of the student’s educational experi- 
ence in the given year. In a closely related variant, researchers instead 
model the level of a student’s test score in a given year as a function of one 
or more prior test scores and measures of the student’s educational experi- 
ence that year. These models are more general versions of value-added 
models because they allow for the impact of past educational experiences 
to affect test scores, but perhaps with some “forgetting” by the student of 
what he or she had learned in the past. 
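
The two variants described in this note can be written out as follows. The symbols are our own illustrative choices, not the source's: $A_{it}$ is student $i$'s test score in year $t$, $C_{it}$ equals 1 if the student attends a charter school that year, $X_{it}$ collects other measured characteristics, and $\varepsilon_{it}$ is an error term.

```latex
% Variant 1: model the gain in test scores
A_{it} - A_{i,t-1} = \beta\, C_{it} + \gamma' X_{it} + \varepsilon_{it}

% Variant 2: model the level, conditioning on the prior score
A_{it} = \lambda\, A_{i,t-1} + \beta\, C_{it} + \gamma' X_{it} + \varepsilon_{it}
```

Setting $\lambda = 1$ in the second equation reproduces the first, which is the sense in which the second is more general. A value of $\lambda$ below 1 corresponds to the "forgetting" mentioned above: only part of last year's achievement carries forward.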






Finally, two methods present what many 
consider best practice for non-experimental 
studies of this type. Each method takes ad- 
ditional steps — in the form of statistical tech- 
niques — to control for differences between 
students attending charter and regular public 
schools that threaten internal validity. 

5) A fixed-effects analysis controls for any unobserved
differences among students that are
constant across time by including a “fixed effect,”
that is, a separate indicator variable for
each student. (If we imagine a graph of
test scores plotted against time for several 
students, we attempt to fit a line through the 
data, but allow “fixed effects,” which means 
separate starting points for each student, to 
allow for unobserved differences in students’ 
prior learning.) This method can be used 
only when researchers have multiple years of 
test score data for each student, and when at 
least some students switch between charter 
and regular public schools. (The fixed-effect
method focuses on students who
switch into and/or out of charter schools be- 
cause only for these students do we have the 
ability to compare their achievement growth 
in both charters and regular public schools.) 
The key advantage of fixed-effect models is 
that they remove the need to compare apples 
to oranges (i.e., students in charter schools 
versus those students who remain in regu- 
lar public schools). Instead, they compare 
an individual student’s gains in achievement 
in years she is in a charter school with years 
in which she is not. Each student then be- 
comes his or her own comparison group. 

There are two key potential weaknesses of the 
fixed-effect method. First, it controls only for 
unobserved characteristics of students that do 
not change over time. We cannot know for
sure that, had a student remained in regular
public schools, his or her test score
growth would have continued as it had before
he or she switched. Second, the fixed-effect
method virtually ignores students who
never enter charter schools or those who al- 
ways attend charter schools during the pe- 
riod under study. If, as seems highly likely, 
students who switch into or out of charter 
schools differ in some unobserved way from 
non-switchers, then it is unlikely that we can 
extrapolate the results from the fixed-effect 
study to these other students. This is an ex- 
ample of limitations on external validity. 5 

A closely related method to fixed effects is 
hierarchical linear modeling (HLM), which 
allows for separate intercepts for differ- 
ent groups (e.g., all the students in a given 
school) and in some formulations also allows 
for the effects of explanatory variables to dif- 
fer by group as well. If the model allows sep- 
arate intercepts for each student, the results 
of studies based on HLM closely resemble 
those of studies based on student fixed-effect 
models. 

6) Two related techniques, propensity score
analysis and the Heckman selectivity correction
model, tackle the problem that children enrolled
in charter schools may differ systematically
from those who remain in regular public
schools. These two methods use slightly dif- 
ferent approaches, but both attempt to remove 
the so-called selectivity bias from comparing 
apples and oranges. We describe each briefly. 

6a) A propensity score analysis attempts to
match charter school students with students
in regular public schools who, based
on observable characteristics, have a similar
likelihood, or “propensity,” to attend charter
schools. An advantage of this method is that
it offers at least a plausible way of evaluating
the impact of charter schools on those
students who never switch back and forth
between charter and regular public schools.
Fixed-effect models cannot do this because
they estimate the effect of attending a charter
by comparing test score growth before and
after a switch. Nevertheless, the propensity
score approach is only as good as the observable
characteristics used to estimate the “propensity”
to attend a given school type. It is
still possible, indeed likely, that unobserved
differences remain between the two groups.

5. Technically, it is not quite right to claim that a fixed-effect
model completely ignores students who never switch into or out of charter
schools. Such students do contribute to the estimated effect of other
variables that do change over time for those students. For instance, if the
researchers allow gains in test scores to vary across grades (e.g., grades
2 and 3), both switchers and non-switchers contribute to the estimated
variations in gains between grades.

6b) A closely related approach involves the 
Heckman selectivity correction model. As in 
propensity score analysis, the Heckman se- 
lectivity correction begins by modeling who 
attends charters and who attends regular 
public schools. The second step of the Heck- 
man procedure then estimates and attempts 
to remove all selectivity bias, leaving an unbi- 
ased causal estimate of the impact of attend- 
ing a charter school. 

The last two methods share a crucial weakness. They 
assume that there is “selection on observables,” mean- 
ing that the researcher has information on all of the 
variables that determine whether a given student de- 
cides to enter a charter or a regular public school. If 
this is not true, some bias will remain in the estimated 
effects of attending a charter. 
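
The matching logic behind a propensity score analysis can be sketched with a toy dataset. Everything below is hypothetical: the eight records, the single observable characteristic (`band`, a prior-score band), and the assumed charter effect of 2 points. With one discrete observable, the estimated propensity is simply the share of each band attending a charter; real studies would typically fit a logit or probit model on many characteristics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (prior-score band, attends charter, test-score gain).
students = [
    ("low", 1, 2), ("low", 1, 2), ("low", 1, 2), ("low", 0, 0),
    ("high", 1, 6), ("high", 0, 4), ("high", 0, 4), ("high", 0, 4),
]


def naive_gap(rows):
    """Raw difference in mean gains, ignoring who selects into charters."""
    charter = [gain for _, c, gain in rows if c]
    regular = [gain for _, c, gain in rows if not c]
    return mean(charter) - mean(regular)


def estimate_propensity(rows):
    """Share of students attending a charter within each observable group."""
    groups = defaultdict(list)
    for band, c, _ in rows:
        groups[band].append(c)
    return {band: mean(flags) for band, flags in groups.items()}


def stratified_effect(rows):
    """Compare charter and regular students who share a propensity score."""
    propensity = estimate_propensity(rows)
    strata = defaultdict(lambda: {"charter": [], "regular": []})
    for band, c, gain in rows:
        key = round(propensity[band], 2)
        strata[key]["charter" if c else "regular"].append(gain)
    weighted = total = 0
    for cell in strata.values():
        if cell["charter"] and cell["regular"]:  # both groups are needed
            n = len(cell["charter"]) + len(cell["regular"])
            weighted += n * (mean(cell["charter"]) - mean(cell["regular"]))
            total += n
    return weighted / total


print(naive_gap(students), stratified_effect(students))
```

In this made-up example the raw gap is zero, because charters enroll mostly low-band students whose gains are smaller to begin with, while the comparison within propensity strata recovers the assumed 2-point effect. The correction works here only because the band is the sole driver of selection; if an unrecorded trait also mattered, some bias would remain, which is the weakness noted above.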

Thus far, the discussion has mainly emphasized con- 
cerns about internal validity. However, both the fixed- 
effect and propensity score models potentially have 
greater external validity, relative to the experimental 
methods to be discussed below, because they often 
use large-scale data drawn from many regular public 
and charter schools in a particular state. Thus, they
can incorporate a large share of charter schools and 
students in the study. Nevertheless, they are likely to 
have less than perfect external validity, and often need 
to be judged on a case-by-case basis. For instance, 
as already mentioned, the results from fixed-effects
studies can be most easily generalized to students
who switch between regular public and charter
schools. It is often uncertain whether such results can
be usefully generalized to students who spend their
entire school career in one school type or another.
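
The reliance of the fixed-effect method on switchers can be seen in a minimal sketch. The data are hypothetical: four students with different personal rates of gain, two of whom switch into a charter, and an assumed charter effect of 2 points. The estimator below is the textbook "within" estimator (subtract each student's own averages, then fit a slope), offered as an illustration rather than as the procedure of any particular study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student, attends charter that year, test-score gain).
records = [
    ("X", 0, 1), ("X", 0, 1), ("X", 1, 3), ("X", 1, 3),    # switcher
    ("Y", 0, -1), ("Y", 0, -1), ("Y", 1, 1), ("Y", 1, 1),  # switcher
    ("Z", 0, 0), ("Z", 0, 0), ("Z", 0, 0), ("Z", 0, 0),    # never in a charter
    ("W", 1, 5), ("W", 1, 5), ("W", 1, 5), ("W", 1, 5),    # always in a charter
]


def naive_gap(rows):
    """Difference in mean gains between charter years and regular years."""
    charter = [gain for _, c, gain in rows if c]
    regular = [gain for _, c, gain in rows if not c]
    return mean(charter) - mean(regular)


def fixed_effects_estimate(rows):
    """Within-student estimate: each student serves as his or her own control."""
    by_student = defaultdict(list)
    for student, c, gain in rows:
        by_student[student].append((c, gain))
    sxy = sxx = 0.0
    for obs in by_student.values():
        c_bar = mean(c for c, _ in obs)
        g_bar = mean(gain for _, gain in obs)
        for c, gain in obs:
            # Non-switchers have c == c_bar in every year, so they add nothing.
            sxy += (c - c_bar) * (gain - g_bar)
            sxx += (c - c_bar) ** 2
    return sxy / sxx


print(naive_gap(records), fixed_effects_estimate(records))
```

Here the naive gap is 3.5 points, inflated by student W, who gains quickly and is always in a charter. The within-student estimate is 2.0 and rests entirely on students X and Y. Students Z and W drop out of the calculation, so the estimate says nothing directly about them.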






AN EXAMPLE OF THE
POTENTIAL WEAKNESSES
OF SCHOOL-LEVEL STUDIES
RELATIVE TO STUDENT
VALUE-ADDED MODELS

An example with fictitious data
illustrates the pitfalls awaiting 
researchers who decide to study 
school averages rather than indi- 
vidual students’ gains in achieve- 
ment. Suppose we have test score data on four 
students over multiple years. In each grade, we 
have information on the student’s percentile 
ranking versus a national sample of students. So 
a student with a score of 75 ranks better than 75 
percent of students nationally, while a student 
with a score of 30 ranks above only 30 percent of 
students nationally. In the district we are study- 
ing, regular public schools succeed in boosting 
achievement at the same rate as schools nationally, 
so that individual students’ percentile rankings are 
constant over time. In contrast, charter schools 
in the sample boost students’ achievement much 
more quickly than do schools nationally, so that 
all charter school students improve their percen- 
tile ranking by two points per year. 6 



Table 1 tells this story quite clearly. Each cell shows
test-score gains for four students by year. The unshaded
cells indicate years in which each student
enrolled in a regular public school, and shaded cells
show years in which each student enrolled in a charter
school. Because student C arrived in the district
only in 2005, we have no test-score gains for this
student.

The patterns that emerge from our simple comparison
of value added, that is, gains in student achievement,
give an accurate portrayal of the causal effect
of charter schools. Each student gains 0 percentile
points per year in regular public school but gains 2
points in a charter school. Charter schools, in our
example, are clearly doing a better job. (It would be
easy to reverse this assumption; the point we are
making here is how easy it is for certain research
methods to obscure the truth.)

6. In practice researchers rarely use percentile rankings, instead
typically using psychometrically scaled scores, but we use percentile
rankings here to simplify the presentation of the key insights.



TABLE 1: Example of a Value-Added Dataset with
Students’ Percentile Rankings by Year

GAINS IN TEST SCORES

YEAR         2002   2003   2004   2005
Student A       0      0      0      0
Student B       0      2*     2*     2*
Student C       -      -      -      -
Student D       0      0      0      2*

AVERAGE TEST SCORE GAINS BY INDIVIDUAL STUDENTS

Charters                  2
Regular Public Schools    0

Note: Shaded cells in the original, marked here with an asterisk (*),
indicate years in which the student was in a charter school.



Now, let’s take a step backwards from this table 
in order to show what can go wrong with simple 
school-level analyses. Table 2 shows the actual test 
scores in each school year, which generated the gains 
in test scores we presented in Table 1. 



TABLE 2: Achievement Levels of Each Student in
Table 1 by Year and Grade

ACTUAL TEST SCORES

YEAR                     2001   2002   2003   2004   2005
GRADE                       2      3      4      5      6
Student A                  75     75     75     75     75
Student B                  36     36     38*    40*    42*
Student C                   -      -      -      -     10*
Student D                  40     40     40     40     42*

AVERAGE TEST SCORES BY YEAR

Charters                    -      -   38.0   40.0   31.3
Regular Public Schools   50.3   50.3   57.5   57.5   75.0

Note: Shaded cells in the original, marked here with an asterisk (*),
indicate years in which the student was in a charter school.






This cut at the data reveals huge variations among 
students in their level of achievement. Student A 
is the highest-scoring student, perhaps because she
comes from a home with highly educated and afflu- 
ent parents. This student remains in regular public 
schools throughout our study. Student B has far 
lower achievement, and after two years in public 
schools switches to a charter school, where his test 
scores begin to improve because of the quality of in- 
struction offered by the charter. Student C is new to 
the district. She has extraordinarily low test scores, 
ranking higher than only 10 percent of students na- 
tionwide, perhaps because she is Limited English 
Proficient. Her parents opt for a charter school in 
2005. Finally, student D has fairly low test scores 
that do not budge while he is enrolled in a regular 
public school. However, in 2005 he switches to a 
charter school. (Or, perhaps, the district responds 
to NCLB requirements by converting his low-per- 
forming school to a charter.) 



Could researchers get a valid answer about charter
school achievement gains if they used a school-level
analysis (Method 1 above)? Figure 1 shows average test scores
by year for all students in regular public schools
and charter schools. A researcher who simply compared
a snapshot of average achievement in charters
and regular public schools would find 2005 test
scores averaging 75 for regular public schools and
31.3 for charters, and might incorrectly conclude
that charter schools were “failing.” If particularly
naive, the researcher might even conclude that “on
average, charter schools are not even half as good
as regular public schools.” We know that both of
these statements are completely incorrect, because
in our made-up example, charter schools manage to
boost students’ national percentile rankings, while
regular public schools merely maintain students’
rankings.
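
The school-level averages quoted above follow directly from the fictitious scores in Table 2. The short sketch below reproduces them. The sector codes ("R" for regular public, "C" for charter) are our own encoding of the table's shading, inferred from the narrative: student B moves to a charter in 2003, while students C and D are in charters in 2005.

```python
from statistics import mean

# Table 2 percentile scores: student -> {year: (score, sector)}.
scores = {
    "A": {2001: (75, "R"), 2002: (75, "R"), 2003: (75, "R"),
          2004: (75, "R"), 2005: (75, "R")},
    "B": {2001: (36, "R"), 2002: (36, "R"), 2003: (38, "C"),
          2004: (40, "C"), 2005: (42, "C")},
    "C": {2005: (10, "C")},
    "D": {2001: (40, "R"), 2002: (40, "R"), 2003: (40, "R"),
          2004: (40, "R"), 2005: (42, "C")},
}


def sector_average(year, sector):
    """Average score of the students enrolled in a sector in a given year."""
    values = [by_year[year][0] for by_year in scores.values()
              if year in by_year and by_year[year][1] == sector]
    return round(mean(values), 1) if values else None


for year in range(2001, 2006):
    print(year, "charter:", sector_average(year, "C"),
          "regular:", sector_average(year, "R"))
```

The charter average falls from 40.0 to 31.3 between 2004 and 2005 even though no charter student's score declines; the drop comes entirely from who is enrolled.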



FIGURE 1: Misleading Trends in Average Achievement in Charter and Regular Public Schools Based on
Average Scores





What about Method 2 above, in which researchers
calculate average test scores by year and
compare trends for charter and regular public
schools? Figure 1 shows that this method also
leads to highly incorrect conclusions. One might
incorrectly infer from Figure 1 that, over time, the
quality of teaching in regular public schools had
improved, while the quality of teaching in charter
schools had fallen quite dramatically. But we
have already seen the underlying data and know
both conclusions are inaccurate. Instead, changes
in the composition of the students at the two types
of schools drive both of these trends. Three types
of compositional change have occurred. First, one
relatively low-scoring student (B) left regular public
school for a charter school, making charters look
like their quality dropped and regular schools look
like they had improved. Second, a low-scoring student
who was new to the district decided to attend
a charter school, which makes it look like charter
school quality plummeted in 2005. Third, a low-performing
regular public school was converted into
a charter school. All three of these compositional
effects contribute toward the erroneous impression
that charters were becoming less effective relative to
regular public schools over time.

Does a switch to analysis at the student level fix
things? Method 3 above, which involves examination
of the level of achievement of individual
students, represents only a very minor improvement.
For instance, if researchers merely examined the
level of student test scores in 2005, they might
incorrectly infer that charter schools caused their
students’ performance to lag behind. Such researchers
might reduce this bias somewhat by controlling for
the characteristics of individual students. However,
this is unlikely to completely correct the problem.

Method 4, the student-level value-added approach,
represents a huge step forward in allowing correct
interpretation of the data. Researchers using this
method would amass data on student gains exactly
as shown in Table 1, and would correctly infer
that attending a charter school causes a student to
gain two percentile points per year, while attending
a regular public school would merely maintain a
student’s percentile ranking. 7
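
The value-added calculation can be checked against the same fictitious scores. The sketch below recomputes each student's year-to-year gain from the Table 2 levels and groups the gains by the sector attended in the later year. As before, the "R" and "C" sector codes are our own encoding of the table's shading, inferred from the narrative.

```python
from statistics import mean

# Table 2 percentile scores: student -> {year: (score, sector)}.
scores = {
    "A": {2001: (75, "R"), 2002: (75, "R"), 2003: (75, "R"),
          2004: (75, "R"), 2005: (75, "R")},
    "B": {2001: (36, "R"), 2002: (36, "R"), 2003: (38, "C"),
          2004: (40, "C"), 2005: (42, "C")},
    "C": {2005: (10, "C")},
    "D": {2001: (40, "R"), 2002: (40, "R"), 2003: (40, "R"),
          2004: (40, "R"), 2005: (42, "C")},
}


def gains_by_sector(data):
    """Collect one-year gains, keyed by the sector attended that year."""
    gains = {"C": [], "R": []}
    for by_year in data.values():
        for year, (score, sector) in by_year.items():
            if year - 1 in by_year:  # a gain needs the prior year's score
                gains[sector].append(score - by_year[year - 1][0])
    return gains


gains = gains_by_sector(scores)
print("charter:", mean(gains["C"]), "regular:", mean(gains["R"]))
```

Student C contributes no gain because only one score is available, which matches the empty row in Table 1. The remaining gains average 2 points in charter years and 0 in regular public school years, the figures built into the example.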



7. It is reasonable to ask whether method 5, student fixed-effects,
and methods 6a and 6b, which attempt to correct for selectivity bias, 
are either necessary or sufficient for making the correct inference in our 
example above. Fixed effects would have generated the “correct” answer 
that charter schools boost achievement by two percentile points per 
year relative to the regular public schools. But simply modeling gains 
in achievement was all we really needed in our example. In addition, 
these fixed-effect models could also have handled more complicated 
and realistic situations in which students vary in their average rate of 
gain in achievement, regardless of school setting. In our simple ex- 
ample we assumed, for instance, that all students would have gained 
0 points per year in a regular public school. In reality, average gains 
might have been zero but with considerable heterogeneity among stu- 
dents. Ignoring these possibilities could have biased our estimates of 
charter school effects up or down, in an unpredictable way. Student 
fixed-effects would have removed any biases due to such heterogeneity. 
Finally, the two methods of correcting for selectivity bias might have 
helped reduce biases in method 3, in which we modeled individual stu- 
dents’ levels of test scores, but only to the extent that researchers had 
information on student characteristics that could have accurately pre- 
dicted how students sorted into charters and regular public schools. 






Lotteries and randomized experiments 



Even very sophisticated non-experimental studies 
cannot provide a guarantee that they are conducting 
the sought-after “apples to apples” comparison. To 
obtain such a comparison, generations of social sci- 
entists have relied upon randomized experiments in 
which participation in a treatment — such as a charter 
school — is determined not by the choices of individ- 
ual schools and students, but by the flip of a coin. A 
hypothetical charter school experiment might begin 
with a group of 600 students, 300 of whom are ran- 
domly chosen to attend a charter school. We would 
not anticipate any systematic difference between the 
two groups, other than the school attended. Com- 
parisons of test-score differences would provide an 
internally valid estimate of the causal effect of attending a
charter school. This approach has become the basis of 
much medical research, for instance when new drugs 
are tested. 

However, it would be difficult to implement a true 
charter school experiment, and even if one were run 
successfully, the results would still require careful in- 
terpretation. Implementation issues abound. It would 
be hard if not impossible to conduct the experiment 
just described. There are ethical questions about 
random assignment; moreover, most people would 
consider conscious family choice, and the resulting 
relationships among parents, school, and children, to 
be an essential part of the charter school experience. 
Thus, students who were randomly assigned to a char- 
ter school rather than choosing it might not experi- 
ence the same “treatment” as students who chose the 
same school. 

The difficulty of interpreting a randomized study is
also often overlooked. Attending a charter school is
unlike a medical treatment or a very specific educational
intervention that is sharply defined and easily
distinguished from other interventions. (Success for All,
a very disciplined and distinctive instructional program,
is an example of such a well-defined “treatment.”) It
is a much more diffuse treatment, such that
children attending two charter schools might have
very different experiences. Moreover, students in some
district-run public schools can have many of the same
instructional experiences as students in some charters. 
If the whole point of the charter school movement 
is to allow these schools greater flexibility and to en- 
courage innovation and diversity, it is hard to know 
what it means to estimate the “average effect” of at- 
tending a charter school. We can indeed attempt to 
estimate this number, but in reality we should expect 
a great deal of heterogeneity among charter schools. 
This issue of heterogeneous “treatment” applies to 
both observational and experimental studies, but the 
issue becomes more obvious when we discuss it in the 
context of experiments. 

There is a good substitute for a pure experiment with 
random assignment, which is a natural outgrowth of 
charter school laws. Most laws require charter schools 
to admit students via a lottery, if the school receives 
more applicants than available seats. This “quasi-experiment”
provides a ready control group: students
who were randomly denied admission to the charter
school. Unobserved factors like motivation, family 
background, and support from the family should on 
average be identical between charter applicants who 
win and lose the lottery. Thus, most lottery studies 
provide excellent internal validity. 8 
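
A simulated lottery illustrates why winners and losers make a fair comparison. The sketch below is purely hypothetical. It assumes applicants are more motivated than average, that motivation raises scores, that the lottery is a fair coin flip, that every winner enrolls and every loser attends a regular public school, and that the true charter effect is 2 points. Real lotteries are messier, for example when winners decline their seats.

```python
import random
from statistics import mean


def simulate_lottery(n_applicants=20000, true_effect=2.0, seed=1):
    """Return the winner-minus-loser gap in mean scores among applicants."""
    rng = random.Random(seed)
    winners, losers = [], []
    for _ in range(n_applicants):
        # Assumption: applicants are unusually motivated (mean 1, not 0).
        motivation = rng.gauss(1, 1)
        baseline = 50 + 10 * motivation + rng.gauss(0, 5)
        if rng.random() < 0.5:  # the lottery ignores motivation entirely
            winners.append(baseline + true_effect)
        else:
            losers.append(baseline)
    return mean(winners) - mean(losers)


print(f"winner-loser gap: {simulate_lottery():.1f}")
```

Because the coin flip is unrelated to motivation, both groups have the same average motivation and the gap lands close to the assumed effect. The estimate describes applicants only, which is where the concerns about generalizability arise.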

However, lotteries can introduce new forms of selec- 
tivity bias that threaten the generalizability, or exter- 
nal validity, of studies that use them. 

First, a lottery study reveals nothing about students 
in the many charter schools that did not receive more 
applications than they had seats available. For exam- 
ple, if we make the common sense assumption that 
the best charter schools are the most likely to receive 
more applications than they have seats, this subset 
of “oversubscribed” charters will be above average in 
quality. 
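
A few made-up numbers show how much this can matter. In the sketch below, the five schools, their effects, and their applicant and seat counts are all hypothetical, chosen so that the more effective schools attract more applicants, in line with the assumption stated above.

```python
from statistics import mean

# Hypothetical charter schools: (true effect in points per year, applicants, seats).
schools = [
    (4.0, 300, 100),
    (3.0, 180, 120),
    (1.0, 90, 100),
    (0.0, 60, 100),
    (-1.0, 40, 80),
]


def average_effect(rows):
    """Unweighted mean effect across a set of schools."""
    return mean(effect for effect, _, _ in rows)


# Only schools with more applicants than seats hold lotteries.
oversubscribed = [s for s in schools if s[1] > s[2]]

print("all charters:", average_effect(schools))
print("lottery schools only:", average_effect(oversubscribed))
```

A lottery study could be fully accurate about the two oversubscribed schools, whose effects average 3.5 points, and still overstate the 1.4-point average across all five.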



8. Researchers often use the term “quasi-experiment” for situa- 
tions like the admissions lotteries we describe here. They are not true 
experiments in which a social scientist would randomly assign students to 
charter schools or regular public schools, but ideally lotteries do succeed 
in randomizing students into or out of charter schools based on lottery 
results. 






Second, some charter schools hold multiple lotteries 
by grade or for students living in different neighbor- 
hoods. It is quite likely such schools will have more 
applicants than seats available only for some grades or 
for students from particular neighborhoods. If, for in- 
stance, a charter school wants to serve students from 
several neighborhoods but gets extra applications from 
only a few neighborhoods, then the lottery samples 
can provide results only for students from these areas. 
The students for whom lottery-based comparisons 
are possible could differ in important (and unknown) 
ways from students who come from neighborhoods 
with no wait list for admission. 

There is a third important way in which lotteries may 
not provide a generalizable estimate of the impact on 
student achievement. Suppose that a policymaker really
wants to know what the overall effect on student
achievement would be if all schools, rather than the
current five percent, were to be operated as charters.
It is impossible to answer this question by studying 
only the small fraction of charter schools that have 
useable lottery data. At present in the United States, 
only a small percentage of students choose to apply to 
charter schools. It is likely that they are quite unrep- 
resentative of public school students generally, both 
in terms of observable characteristics such as race and 
ethnicity, and in terms of unobservable — but crucially 
important — characteristics such as motivation, innate 
ability, and the degree of family support for switching
schools. If many more students were to apply to
charter schools, it is uncertain whether currently observed
effects, whether positive or negative, would be
duplicated.

These three types of selectivity bias — the potentially 
unrepresentative nature of the subset of charter schools 
that perform lotteries, the potentially unrepresenta- 
tive nature of the subset of students within a given 
school who had to win a lottery to gain admission, 
and the self-selection of students into charter schools 
more generally — raise important concerns about the 
overall external validity of lottery-based estimates of 
charter school effects. 



A fourth problem of the lottery method is that it does 
not take account of the fact that many families denied 
admission to one school of choice continue applying 
until they get admitted to another one. This form of 
bias, known as substitution bias, is potentially serious 
because the lottery analysis may, in extreme situations, 
wrongly suggest that charter schools have no effect 
on student learning, when in truth lottery losers sim- 
ply choose to attend another equally good school of 
choice. In the extreme, all the “comparison” students 
for one charter school could be enrolled in some other 
charter school. 
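
The arithmetic behind substitution bias can be sketched in a few lines. This is an illustration only, under simplifying assumptions that are not in the text above: every charter school has the same effect on every student, and lottery losers who do not enroll in another charter attend a regular public school. The function name and the 5-point effect are hypothetical.

```python
def observed_lottery_gap(true_effect, losers_in_other_charters, winners_attending=1.0):
    """Expected winner-minus-loser score gap under the simplifying assumptions.

    true_effect: hypothetical gain from attending a charter rather than a
        regular public school (assumed equal for all students and charters).
    losers_in_other_charters: share of lottery losers who enroll in some
        other, equally effective charter school (0.0 to 1.0).
    winners_attending: share of lottery winners who actually enroll.
    """
    for share in (losers_in_other_charters, winners_attending):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must lie between 0 and 1")
    # Only the difference in charter attendance between the two groups
    # shows up in the winner-versus-loser comparison.
    return true_effect * (winners_attending - losers_in_other_charters)


# Hypothetical 5-point true effect, for illustration only.
for share in (0.0, 0.5, 1.0):
    print(share, observed_lottery_gap(5.0, share))
```

As the share of lottery losers enrolled in other charters rises toward one, the observed gap shrinks toward zero even though the assumed true effect is unchanged.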

A fifth potential problem is that some school opera- 
tors could be tempted to conduct a lottery in name 
only, giving preferences to certain types of students. 
It is therefore incumbent upon researchers to verify 
that lottery winners and losers have statistically indistinguishable
characteristics at the time of the lottery. If they do
not, that calls into question whether a real lottery
occurred. Such manipulation is a strong possibility when the lottery
is conducted by the school itself rather than by a neutral
entity. A lottery that is not open to the public
could also raise red flags about whether it was a “real” 
lottery. 
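
One simple way to carry out such a check is to compare winners and losers on a characteristic measured before the lottery, such as a prior-year test score. The sketch below computes a Welch two-sample t-statistic using only the Python standard library. The function name and the data are hypothetical, and the choice of test is an assumption of this example rather than a procedure prescribed here; in practice researchers would check many baseline characteristics and account for separate lotteries by grade or neighborhood.

```python
from math import sqrt
from statistics import mean, variance


def baseline_balance_t(winners, losers):
    """Welch t-statistic for the difference in a pre-lottery characteristic.

    winners, losers: lists of baseline values (for example, prior-year
    test scores), each with at least two observations.
    """
    if len(winners) < 2 or len(losers) < 2:
        raise ValueError("need at least two observations per group")
    diff = mean(winners) - mean(losers)
    # Standard error without assuming equal variances in the two groups.
    se = sqrt(variance(winners) / len(winners) + variance(losers) / len(losers))
    if se == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / se


# Hypothetical prior-year scores, for illustration only.
winners = [50, 52, 48, 51, 49]
losers = [51, 49, 50, 52, 48]
print(baseline_balance_t(winners, losers))
```

A t-statistic far from zero (a common rule of thumb is an absolute value above roughly 2) would suggest that winners and losers differed before the lottery, which is the warning sign described above.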

A sixth issue related to admission lotteries is that 
not all lottery winners will choose to attend a charter 
school. So, although it is straightforward to estimate 
the impact of “winning a lottery,” it is more difficult 
to assess the average effect of sending an applicant to 
a charter school because those lottery winners who 
choose to attend charters may differ in important 
ways from those who decide not to do so. This is not a 
fatal problem, but the truth is that most policymakers 
would like to know the impact of “attending a char- 
ter,” rather than the impact of “winning a lottery to 
attend a charter.” 9
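
The distinction between the two impacts can be made concrete with a standard instrumental-variables calculation, often called the Wald estimator. It divides the effect of winning the lottery by the difference in charter attendance rates between winners and losers. The sketch below is an illustration and not a method described in this paper. It assumes that winning the lottery affects achievement only through charter attendance and that no student is induced by winning to stay away from a charter. The function name and the data are hypothetical.

```python
from statistics import mean


def wald_estimate(scores, won, attended):
    """Effect of attending a charter for students whose attendance depends
    on the lottery outcome, under the assumptions stated above.

    scores: outcome test scores, one per applicant.
    won: 1 if the applicant won the lottery, else 0.
    attended: 1 if the applicant attended a charter school, else 0.
    """
    win_scores = [y for y, w in zip(scores, won) if w]
    lose_scores = [y for y, w in zip(scores, won) if not w]
    # Impact of "winning a lottery" (the intent-to-treat effect).
    itt = mean(win_scores) - mean(lose_scores)
    # Difference in charter attendance rates between winners and losers.
    win_rate = mean([a for a, w in zip(attended, won) if w])
    lose_rate = mean([a for a, w in zip(attended, won) if not w])
    if win_rate == lose_rate:
        raise ValueError("winning the lottery did not change attendance")
    return itt / (win_rate - lose_rate)


# Hypothetical data: half of the winners attend, no losers attend.
scores = [60, 60, 50, 50, 50, 50, 50, 50]
won = [1, 1, 1, 1, 0, 0, 0, 0]
attended = [1, 1, 0, 0, 0, 0, 0, 0]
print(wald_estimate(scores, won, attended))
```

In this made-up example the impact of winning is 5 points, but because only half of the winners attend, the implied impact of attending is 10 points. The estimate applies only to applicants whose attendance is changed by the lottery result, so it does not remove the external validity concerns discussed earlier.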



9. For excellent reviews of the strengths and weaknesses of experimental
and quasi-experimental evaluations, in the context of studies of the
impact of government training programs, see James Heckman and Jeffrey
Smith, “Assessing the Case for Social Experiments,” Journal of Economic
Perspectives, Spring 1995, 9(2), pp. 85-110, and James Heckman, Robert
LaLonde, and Jeff Smith, “The Economics and Econometrics of Active
Labor Market Programs,” in Handbook of Labor Economics, Vol. 3A, ed.
O. Ashenfelter and D. Card, 1865-2097 (Amsterdam: North-Holland,
1999).






